Usability Blogs

To Buy or Not To Buy Text Link Ads

August 31st, 2005

by Stephan Spencer

A few weeks back I blogged some advice here for business bloggers who might want to consider text link advertising as part of their blog marketing mix.

Well, there’s been a lot of controversy as of late about buying text links. Blogger Phil Ringnalda published a scathing post accusing publishing house O’Reilly of being a search engine spammer. O’Reilly’s founder, Tim O’Reilly, responded to the accusations on his own blog. Google engineer Matt Cutts posted a comment to Tim’s post admitting that Google has decreased the voting power of sites like perl.com and xml.com and downgraded the reputation of some of their outbound links. Ouch!

Matt’s (and presumably Google’s) position was loud and clear:

If you don’t want your own site to suffer the same fate as O’Reilly, you better tag your link ads with a rel=nofollow attribute so that you don’t pass any PageRank score to your advertisers.
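For what it’s worth, the tagging itself is trivial. Here is a minimal Python sketch of how a site might generate its ad links. The function name `ad_link`, the `pass_pagerank` flag, and the example URL are all made up for illustration; none of this is something Google or any ad network provides.

```python
from html import escape


def ad_link(url: str, text: str, pass_pagerank: bool = False) -> str:
    """Build an <a> tag for a paid link.

    By default the link carries rel="nofollow", which asks search engines
    not to count it as a vote. Set pass_pagerank=True to leave it off.
    """
    rel = "" if pass_pagerank else ' rel="nofollow"'
    return f'<a href="{escape(url, quote=True)}"{rel}>{escape(text)}</a>'


print(ad_link("http://www.example.com/", "Example Sponsor"))
```

Whether to leave the attribute off for an advertiser you genuinely vouch for is exactly the judgment call discussed in the rest of this post.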

In my mind, that doesn’t seem quite fair. Website owners and bloggers work hard to build a content-rich site with a good PageRank score. By insisting that webmasters cut off the flow of PageRank to their advertisers, Google’s black-or-white stance diminishes the earning ability of these websites. This of course decreases the value of the link ads to those advertisers, and consequently the revenue likely to be realized from them. Granted, no savvy advertiser is going to buy a text link ad solely based on PageRank score, but PageRank does factor into the equation.

This makes me wonder what Google’s position is on BlogAds.com, whose ads are part banner ad, part text link ad. A good blog ad contains useful content. Why shouldn’t the blogger be allowed to “vouch for” (by not tagging the link with nofollow) the links contained within that ad if they so choose?

Most “white hat” SEOs such as Christine Churchill believe text link advertising is a legitimate practice. I agree with her.

I wonder what Google would do if all the websites across the Internet decided to take all their banner ad inventory and bypass the click-tracker redirect that counts all the clickthroughs. Suddenly all these new votes would start counting all over the Internet for commercial advertisers and sponsors. Wouldn’t that throw Google for a loop!

So what is the bottom line here for bloggers who are looking to advertise? It’s basically this: be discriminating in your link buying. Text link advertisements are not inherently evil. Just don’t buy ads on sites where any of the other advertisers on the site are misleading, deceptive or misrepresentative. By that, I mean things like the following:

  1. Setting the ad’s link text to some keyword-rich phrase that doesn’t accurately reflect the page that is linked to.
    e.g. An ad on SeacoastOnline.com proclaims “The North Face” but that isn’t The North Face!
  2. Linking the ad text to a landing page that is built for search engines and not for people.
    e.g. the “Discount Vacations” ad on DailyItem.com points to one of Orbitz’s many “doorway pages”.
  3. Hiding or obscuring the link so human visitors can’t see it, only search engines.
    e.g. Doing a “View Source” on the home page of PRNewswire.com reveals these hidden links:

    </noframes>
    <a href="http://www.icrossing.com">Search Engine Marketing</a>
    <a href="http://sev.prnewswire.com">Search Engine News Release Optimization</a>
    </frameset>
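
If you want to check a page for this particular trick without eyeballing the source, here is a rough sketch using Python’s standard `html.parser` module. It only flags anchors that sit directly inside a `<frameset>`, where frame-capable browsers never render them, which is the pattern in the snippet above. The class and function names are my own, and it will not catch other hiding techniques (CSS, tiny fonts, and so on).

```python
from html.parser import HTMLParser


class FramesetLinkFinder(HTMLParser):
    """Collect href values of <a> tags that appear inside a <frameset>."""

    def __init__(self):
        super().__init__()
        self.frameset_depth = 0  # > 0 while we are inside a <frameset>
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "frameset":
            self.frameset_depth += 1
        elif tag == "a" and self.frameset_depth > 0:
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "frameset" and self.frameset_depth > 0:
            self.frameset_depth -= 1


def find_frameset_links(html_source: str):
    """Return the list of link targets found inside any <frameset>."""
    finder = FramesetLinkFinder()
    finder.feed(html_source)
    finder.close()
    return finder.links
```

Feed it the page source as a string and anything it returns deserves a closer look by a human.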

And it goes without saying that you should refrain from such practices yourself when you advertise.

This post is based on material taken from my own blog across three separate posts: Link buying - ethical or unethical?, Buying links - Google’s perspective, and Buying link ads - the ethical debate rages.


Coverage of SES San Jose: Favorite SEO Tools

August 11th, 2005

by Stephan Spencer

Here we are, the last session of Search Engine Strategies. It’s been a great but exhausting conference. The session I attended was on SEO Tools. Three of the five panelists provided their PowerPoints on their websites (it just so happens they were the three best presentations), which you should definitely check out because they show screenshots of these tools in action. Download the first two PowerPoints from www.webuildpages.com/ses and the third from www.epiar.com/ses.

Jim Boykin:
Wayback Machine
Find Age of Website Tool
Poodle Predictor (spider simulator)
Copyscape (website plagiarism search)
URLinfo
Backlink Anchor Text Analyzer
KwMap (a keyword map for the whole Internet)
Hubfinder (looks for co-occurring backlinks, which may be authoritative links that help satisfy topic-dependent link authority algorithms. To use Hubfinder, enter a subject and/or competing URLs to analyze linkage data of top-ranked competing sites via the Yahoo! API.)
Keyword Tracker

Todd Malicoat:
Domain/server level information: Whois Source, DNS Stuff, and Check Class C IP Address (this last one is to make sure the links that you plan on buying are on different class C blocks)
Competitive information tools: GoogSpy, SwitchProxy extension for Firefox
Backlinks & offpage information tools: Pages Indexed, Backlinks Domain, PageRank, Allinanchor, Keyword Density tool, Yahoo! Link Harvester
Keyword information: Google Sets, Keyword Density tools, Google Suggest, Snap.com Keyword Stats
Header & page level information: Server Header Checker
Spidering & indexability: Xenu’s Link Sleuth, Sandbox Detection Tool
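
The class C check Todd mentioned is easy to approximate yourself once you have resolved each site to an IP address. This is only a sketch under my own assumptions: the function names are hypothetical, it treats a “class C block” as simply the first three octets of an IPv4 address, and the addresses in the example are reserved documentation addresses, not real link sellers.

```python
import ipaddress


def class_c_block(ip: str) -> str:
    """Return the first three octets of an IPv4 address, e.g. '192.0.2'."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    return ".".join(str(addr).split(".")[:3])


def shared_class_c(ips):
    """Group addresses by block and return only blocks holding 2+ addresses."""
    blocks = {}
    for ip in ips:
        blocks.setdefault(class_c_block(ip), []).append(ip)
    return {block: hosts for block, hosts in blocks.items() if len(hosts) > 1}


print(shared_class_c(["192.0.2.10", "192.0.2.77", "198.51.100.5"]))
```

Any block that comes back here means two or more of your prospective links live in the same neighborhood.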

Ken Jurina:
Firefox extensions: SEOpen, Web Developer, Search Status, PDF Download, Roboform toolbar, Search Keys, IE View (all downloadable from http://extensionroom.mozdev.org)
Web CEO
Click Tracks
LiveSTATS
Roboform
Marketleap Link popularity check, Search engine saturation, Keyword verification

Bill Hartzer:
OptiLink
OptiSpider
Keyword Combinations
Keyword Helper
URL Trends domain analyzer (it also supports notifying you via email or RSS when changes happen)
Sources of other tools: www.seocompany.ca/tool/seo-tools.html, www.digitalpoint.com/tools/, www.seotoolset.com, www.seochat.com/seo-tools

Paul Bruemmer:
Alexa
RankingManager
Linxviewer
Yahoo! Finance
Hoovers Pro Plus
Print Screen Plus

Well, I wanted to blog many more sessions than I did, but it ended up being a lot harder than I thought it would be. Thankfully for you, dear readers, there were many other capable bloggers blogging the SES sessions. In particular, check out the coverage on the Search Engine Roundtable blog.

By the way, a big hello to all the bloggers I met for the first time at SES, including Scott Miller, Aaron Wall, and Barry Schwartz, to name a few.


Coverage of SES San Jose: Search Engine Q&A On Links

August 10th, 2005

by Stephan Spencer

I’m a bit behind on my conference session blogging. Waaay too many parties going on; doesn’t leave much time for blogging. The Google Dance last night. Yahoo! party at Great America the night before. And tonight I’ve got another party to go to. Yesterday I spoke on RSS. I’ll post a recap on that session later.

I just attended “Search Engine Q&A On Links”, which was great. Lots of useful advice from Google and Yahoo! about linking (nobody seemed to want to ask poor Ask Jeeves any questions). It was funny how obviously diametrically opposed the engines were to the immediately prior session on “Buying and Selling Links”. It’s hard to reconcile the two different sets of advice. Matt in the hallway before this session was adamant: “Don’t buy links!”

Anyways, without any further ado, here’s the session recap:

Kaushal Kurapati from Ask Jeeves:
Be cautious of: reciprocal links and purchasing links
Avoid: link farms, cloaking pages, invisible or hidden links that trick the crawler
Become an authority on a subject
Focus on your business and content. Rest will follow. [I say: “yeah, right…”]
Teoma uses subject specific popularity: garner respect in your industry, subject-specific text based links can be understood. (hubs and authorities model)

Tim Mayer from Yahoo!:
Here’s some important news!! Yahoo! has just launched a brand new service: Site Explorer from Yahoo! Search. Stop scraping the Yahoo site for backlink results and use Site Explorer instead. Access via an API is offered too. And you can export as a CSV file.
Yahoo has 19.2 billion web objects in its index. Over 20 billion objects, when you include the audio and video.
Plans to use community to improve search quality. Social search = within a trusted network, where someone within your network vouches for a site.
Create natural linking strategies. When things start to look unnatural is when you’ll start getting into trouble. We look at intent (linking to plasma TVs, diamonds, and Viagra all on the same page) and extent (i.e. what looks normal; having everything on the page as links, or 200 links on the page, is too much!)
Yahoo! offers a much more comprehensive sample of backlinks than Google, but not a complete set of backlinks. New system (Site Explorer) will be reasonably comprehensive, in his opinion the most comprehensive out there.
It’s unnatural to link to sitemap-1 sitemap-2 sitemap-3 sitemap-4 sitemap-5. If you are doing this, you’re headed in the wrong direction.

Matt Cutts from Google:
Good links are earned links, links that are based on editorial discretion.
Create services that are really useful, e.g. newsletters, an article a day, syndication through RSS (attribute my article and give me a link). Start a blog.
Matt launched his blog today: mattcutts.com
Think outside the box.
Only SEOs and librarians do backlink searches. Historically we decided to dedicate a subset of our servers to backlinks. Only a sampling of backlinks would be displayed but only for a threshold of PageRank 4 or higher pages. A suggestion was made to show backlinks for lower PageRank pages too. We liked that idea so we now show a random sampling of backlinks, including low PageRank scoring pages too. We show twice as many backlinks as shown before, but still it’s only a sampling of the backlinks.
In graph theory terms, a clique, in which every node in the graph links to every other node, is very unnatural. So don’t link to every single node in your network of sites; it’ll get flagged.
For dynamic sites, you’re very safe if you have fewer than 2 parameters; keep the values of those parameters to fewer than 5 digits, and don’t name a parameter “id”. Googlebot sometimes tries variations of URLs by dropping parameters, but we only do that deep level analysis on big, quality sites.
Another good approach that alltheweb came up with: spider would always go 1 dynamic page deep from a static page.
Search engines only grab 100k or 200k or 500k so be careful loading up a huge page with a lot of links.
PageRank isn’t as important as SOME people make it out to be. BUT it’s NOT like “PageRank? Oh yeah let’s shuffle that one under the rug! That was sooo 4 years ago!”
“BO” = backlink obsession
We export PageRank only once every 3 months or so.
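
Matt’s rule of thumb for dynamic URLs (fewer than 2 parameters, values under 5 digits, no parameter named “id”) is simple enough to turn into a quick self-check. The sketch below is my own reading of that advice, not anything Google publishes; the function name `crawl_warnings` and the example URLs are hypothetical.

```python
from urllib.parse import urlsplit, parse_qsl


def crawl_warnings(url: str):
    """Return a list of ways a URL departs from the rule of thumb above."""
    params = parse_qsl(urlsplit(url).query, keep_blank_values=True)
    warnings = []
    if len(params) >= 2:
        warnings.append(f"{len(params)} parameters (the advice was fewer than 2)")
    for name, value in params:
        if name.lower() == "id":
            warnings.append('parameter named "id"')
        # Only purely numeric values are checked against the digit limit.
        if value.isdigit() and len(value) >= 5:
            warnings.append(f"{name}={value} has 5 or more digits")
    return warnings


for problem in crawl_warnings("http://www.example.com/p?id=123456&cat=7"):
    print(problem)
```

An empty list does not guarantee a URL gets crawled, of course; it only means it stays inside the “very safe” zone described in the session.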

Technorati tag: Search Engine Strategies


Coverage of SES San Jose: Search Algorithms, The Patent Files

August 8th, 2005

by Stephan Spencer

I attended the “Search Algorithms: The Patent Files” session first thing this morning. The panelists were Rand Fishkin, CEO of SEOmoz.org, Ani Kortikar, Founder and CEO, Netramind, Dr. E. Garcia of Mi Islita.com, and Jon Glick, Senior Director of Product Search, Become.com. My favorite presentation was from Jon. He was not overly technical (Dr. Garcia lost me at the advanced mathematics talking about calculating dot products of vectors) yet he gave solid advice. Here’s what he had to say, in summary:

Take these patents with a grain of salt, because…
- patent applicants don’t need to use all the stuff they include in a patent application.
- patent applicants don’t have to disclose all of their technology’s features in a patent application.
- and they recognize that SEOs and their competitors are poring over their patent apps.

With that said, there are some valuable learnings from the 2003 Google patent. Search engines may take into account: CTR on your page in SERPs, rapid changes in content, rapid growth of in-links, and length of time users spend on your site.

So which of these actually impact your rankings? Some are red herrings, such as:
- Clickthrough rate (CTR): it’s too easy to distort (e.g. through clickbotting, which is evil and likely to get you penalized). Probably CTR is used for demotion only. In other words, high CTR won’t help your organic rankings, but low CTR may lower your rankings.
- Time spent on a site: when users hit the back button almost immediately, it can signify an irrelevant page or 404 error. However, if this were used, it would in effect reward black hat tactics like mousetrapping and endless pop-ups, tactics that trap users within a site.
- Rate of change in content: Most recent crawl date, last time the content changed, registration date, and first crawl date mostly impact crawl frequency, not ranking. Duplicate detection technologies are used to find meaningful changes in site content. Meaningful changes do not include putting today’s date or today’s weather on the page; that doesn’t help rankings. When a site changes its IP address, it is often re-evaluated because it is possibly under new ownership.

According to Jon, what’s not a red herring is:
- Rate of change in links: Most search engines limit how quickly a site can gain connectivity (sandboxing, link aging). A sudden jump in in-links (e.g. from link farming and interlinking and triangle linking lots of domains) can draw scrutiny. There are exceptions for “spike” sites (editorial review, lots of accompanying news/blog posts, lots of web searches).


Coverage of SES San Jose: Earning from Search & Contextual Ads

August 8th, 2005

by Stephan Spencer

Hello from sunny San Jose. I’m at the Search Engine Strategies conference - THE place to be if you care about search. I’m going to be blogging the sessions, so stay tuned over the next 4 days.

Here’s my first installment: a recap on the session I attended before lunch today on “Earning from Search & Contextual Ads”. Panelists were: Jason Calacanis, Co-Founder, Weblogs, Inc., Will Johnson, Yahoo! Search Marketing, Scott Meyer, President & CEO, About, Inc., Gokul Rajaram, Group Product Manager of Google AdSense, Google Inc. and Jen Slegg, Owner, JenSense.com.

Jen from JenSense.com started the panel off:
Jen started off by comparing and contrasting AdSense w/ Yahoo’s new YPN (Yahoo Publisher Network). Similarities include…
- very large pool of advertisers
- real time stats
- neither will tell you the revenue split
- can’t show both YPN and AdSense ads on the same page

Differences include…
with AdSense:
- 4 ads in smaller font
- international publishers ok
- offers additional tools & services
- more competition for highest paying
- multiple ad units per page
- “smart pricing” (CTR taken into account in pricing)

with YPN:
- 3 ads in a much larger font
- beta for US publishers
- only traditional ad units
- fewer publishers means less competition
- same ads on multiple units
- no smart pricing
- in future will be able to transfer your earnings to your advertising account

Many alternatives to AdSense and YPN:
- Kanoodle brightads: avg $0.35 earnings per click (EPC). 30,000 advertisers in network.
- Adsonar: thousands of advertisers
- Clicksor: avg $0.20 EPC. 4,000 advertisers running 20,000 campaigns. Will pull ads from other ad networks if insufficient clicks.
- Chitika: avg EPC $0.50
- Mirago: avg EPC .21p (approx $0.31 USD). you must invoice them. 12,000 advertisers
- ContextWeb: over 40,000 advertisers
- bidclix: avg EPC 0.30. 11,000 advertisers
- Others include Miva Adrevenue xpress, Quigo, etc.
Rhetorical question from Jen: “When will MSN jump in?”

Optimizing tips:
- Placement: Bottom of page is bad. Good practice is to make link color the same as other links on the site. Another good tactic is to place the ads on the left column where the nav usually is.
- Proximity:
- Ad unit selection: Try a variety of sizes and test.
- Ad unit colors & borders: Don’t use the standard ad unit colors / layout. Mix things up to prevent banner blindness. Try both complementary and contrasting colors. Most sites find hidden borders yield the highest CTR, by something like 2 or 3 times.
- URL filters: Don’t do it as a way to get higher paying ads to appear. Only block your direct competitors or your own websites.

Testing:
- Use AdSense or YPN channels to track highest CTR & earnings pages. AdSense or YPN may perform better. Try both.
- Test on non-holiday weeks
- Try switching ad placement, ad unit sizes and colors
- Keep track of what works and what doesn’t
- Never assume that what works on one site will work on another.



Link Buying Basics for Business Bloggers

August 6th, 2005

by Stephan Spencer

Any search engine optimization consultant will tell you that links are the currency of the Web. They’re also the currency of the blogosphere. Without any inbound links, you’re just blogging to yourself. In Mike Grehan’s seminal piece “Filthy Linking Rich”, he explains how those rich with links just keep getting richer.

So how can new business bloggers get a jump start in the search engines? Simple: just whip out your wallet. The business of text link ad buying has matured, and it’s on the up-and-up. We’re not talking about “buying PageRank”… what we’re talking about is a totally legitimate business practice of buying text ads where you choose your hyperlinked words carefully based on keyword research and your advertisement appears on a reputable, relevant website. And of course, it links directly to your website, sans click tracking, so the ‘search engine juice’ flows unhindered. If the practice weren’t legit, would you see such well-respected link-building pundits as Eric Ward on the board of the link broker Text-Link-Ads.com?

Buying links is not quite as simple as I make it out. Yes, you can use a broker and they’ll happily take your money. Caveat emptor! In order to make an informed purchase, you’ll need to evaluate the quality of the links using a number of criteria. Here’s such a list of criteria, courtesy of the ABAKUS SEO Blog:

  1. Inbound site traffic and page traffic.
  2. Inbound dot gov and dot edu links.
  3. Click-through traffic you get from the page.
  4. Site in DMOZ and Yahoo directory.
  5. Age of domain and time of domain being used (longer the better).
  6. Inbound links shown to that page on Yahoo (link:http://www.domain.ext/page/).
  7. Ranking of page for the keywords it is optimized for.
  8. Relevance of theme of site and page to your site and page.
  9. Alexa ranking (lower is better).
  10. Deep link compared to home page links.
  11. Location of link.
  12. Length of allowed description text.
  13. PR of page (still matters a bit).

Personally, I’d also add to the list:

  1. Appearance of any link advertisers on the page that would attract the attention (negatively) of the search engines (e.g.: casinos, Texas Hold’em, Viagra, pharmaceuticals, insurance, Rolex, etc.)
  2. Quality of the landing pages of the existing link advertisers (if you find any are spammy-looking, turn and run!)
  3. Placement of the link. (i.e.: being relegated to the bottom of the page as footer links is not ideal)


How blogging has paid off

July 19th, 2005

by Stephan Spencer

I was recently interviewed by a journalist on business blogging and its benefits. He wanted to know specifically what it’s done for me to have a blog. Here’s what I told him:

  • I’ve gotten inquiries from prospects who found Netconcepts through my blog.
  • My blog helps me get speaking gigs and PR. In fact, I recently got one of my blog entries taken verbatim by a well-respected US magazine — DM News — and published as an article.
  • It builds credibility and establishes me as a thought leader in the eyes of prospects and clients. For example, one of our recent clients chose us over a competitor for online marketing services partly because of my blog.
  • It’s helped upsell existing clients on additional services, as many of them are regularly reading my blog. For example, some of our clients are going to start a blog and use us for blog design, blog consulting, etc.
  • I’ve gotten links from popular bloggers, like Robert Scoble of Microsoft. It’s much more difficult to get a mention from Scoble (or other prominent bloggers) if you’re not a blogger. Scoble’s blog, called Scobleizer, is one of the most well-linked blogs on the Internet. Some bloggers have even included me on their blogroll, like Toby Bloomberg of Diva Marketing Blog (Thanks, Toby!)
  • It’s helped me with recruiting panelists for Thought Leaders Summits that I organized and moderated for MarketingProfs. For example, the lineup of panelists for one of the recent summits included Internet marketing gurus Seth Godin, Doc Searls, Robert Scoble, Steve Rubel, and Debbie Weil. My blog played a role in establishing my credibility with them and getting them to respond to my “cold call” email message.
  • Blogs are also great for SEO (search engine optimization). Links are important to the search engines, and the blogosphere is richly interlinked with bloggers linking so much to each other. Blogs are also rich in content, which search engines also like. If I blog about RSS and SEO (which I have), for example, next thing I know I’m #1 in Google for [rss and seo].
  • I’ve also built some great business relationships with other respected bloggers. They have referred business to me, shared speaking opportunities with me, etc.

I had yet another experience with that last item, just today in fact. I’m speaking at the Frost & Sullivan Sales and Marketing East conference in Boston, and a fellow blogger from a competing SEO firm who was sitting at the table I was facilitating earlier today on blogging very kindly publicly commended my blog to the rest of the group for its content and thought leadership. (Thanks Stephen!) There’s a guy who understands the benefits of coopetition (rather than competition)!

The journalist also wanted to know how my blog’s traffic had grown over time. Here are the charts I shared with him showing the growth trends in pageviews and visitors:

Pageviews:

Visitors:

A pretty respectable trend, I’d say. If you’re curious what the actual numbers are, I will give you a hint and say that both charts measure in the tens of thousands per month. Hopefully the trend will continue.

One thing I really need to do to keep the numbers heading northward is to blog more frequently. I’m sure traffic growth will accelerate once I do. I just need to buckle down! I guess I’ll just sleep less… (sigh). You other bloggers out there know what I’m saying here, don’t you! More often than we’d like, it’s the wee hours when we’re blogging.

How might a blog pay off for you? For some general ideas, read this article of mine, on blogging, published in last month’s issue of Multichannel Merchant magazine.


Control your RSS URLs; the right way to move to and away from Feedburner

June 28th, 2005

by Stephan Spencer

I’m guest blogging over at Problogger.net, and my recent post Are you letting Feedburner hold you hostage? generated some interesting discussion, including several comments from Feedburner itself. In fact, Eric Lunt from Feedburner formulated a thoughtful response within his own blog.

To summarize my points: Don’t publish to the world an RSS feed URL that you don’t own. I see it as no different from handing out thousands of business cards with an @earthlink.net address proudly printed on it — rather than one @ your own domain name. Cuz then, you’re married to Earthlink (or in the case of your RSS feed… Feedburner). If you switched services, your existing subscribers would all need to update their feed URLs in their news readers. And what’s the likelihood of that happening! I suggest, instead, one of the following two options:

  • Use a URL from your own domain, then have your webserver redirect everyone to whatever your feeds.feedburner.com/[your-feed-here] URL is. I found that some newsreaders (like NetNewsWire) choke on a “301” permanent redirect, so for the time being you should stick with the standard “302” (temporary) redirect, even though a 301 would be ideal from an SEO standpoint.
  • Alternatively, you could set up a DNS entry of feeds.yourdomainname.com (or whatever it is) to be an alias (a “CNAME”) to feeds.feedburner.com. Then, if you switch from Feedburner, you’d update the CNAME to point to the hostname of the new service. Note that the rest of the URL has to match exactly. I’ve set up my feed to work at http://feeds.stephanspencer.com/scatterings. (Note that this only works if you’re a paying Feedburner Pro subscriber.)
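
To make the first option concrete, here is a minimal sketch of the redirect written as a tiny Python WSGI app. In practice most people would do this with a one-line rule in their webserver config instead; this is only an illustration. The path `/feed.xml` and the `your-feed-here` feed name are placeholders you would replace with your own.

```python
# Hypothetical values: substitute your own feed path and Feedburner feed name.
LOCAL_FEED_PATH = "/feed.xml"
FEEDBURNER_URL = "http://feeds.feedburner.com/your-feed-here"


def app(environ, start_response):
    """Minimal WSGI app: 302-redirect the feed URL you own to the hosted feed."""
    if environ.get("PATH_INFO") == LOCAL_FEED_PATH:
        # 302 (temporary) rather than 301, per the newsreader caveat above.
        start_response("302 Found", [("Location", FEEDBURNER_URL)])
        return [b""]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"Not found"]
```

The point of the design is that subscribers only ever see the URL on your domain. If you change services later, you edit `FEEDBURNER_URL` and nobody has to resubscribe. For a quick local try-out, the standard library’s `wsgiref.simple_server.make_server` can serve `app`.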

This then got me thinking about moving to, rather than away from, Feedburner. Feedburner is a great service — particularly their Pro version. It has a lot to offer in the way of tracking subscribers, clickthroughs, and so forth. If you already have people subscribing to your RSS feed and you want to start using Feedburner, then you’ll need a way to drive those pre-existing subscribers to your Feedburner version of your feed. The way I’d suggest you do this is through a 302 redirect from your old feed URL to your new Feedburner feed URL, ideally with your domain name in the URL (using the above-mentioned CNAME approach).


When will major search engines start indexing RSS feeds properly?

June 17th, 2005

by Stephan Spencer

I find it a bit unbelievable that the major search engines — Google, Yahoo!, MSN Search, and Ask Jeeves — still don’t offer RSS feed searching combined with RSS search results feeds as part of their Web search. Specialized RSS feed search engines like Feedster, PubSub and Technorati have risen to the occasion, filling the void left by the major engines’ inaction. Bloglines, the AskJeeves-owned company, has announced a blog/RSS search engine service that’ll compete with Feedster, PubSub, and Technorati, but still that’s a far cry from embedding RSS search right into the Web search box.

Here’s how each of the majors handles RSS feeds:

Google:
screenshot of search listing of an RSS feed in Google
another screenshot of search listing of an RSS feed in Google

  • has URLs of valid RSS feeds in its index (due to links that point to those feeds)
  • doesn’t recognize the XML file format of RSS feeds (as you can read on the excerpted screenshots above)
  • only rarely indexes the feed (I base that not just on the fact that nearly all RSS feeds are shown in Google results with no title or snippet as in the first screenshot above, but also because, out of 64,000 RSS feed files hosted by feeds.feedburner.com, only 19 are shown to contain the word cheese, the last 2 of which show up in the results only because cheese appears in links pointing to the feed; yet the same search on Yahoo! shows over 400. So clearly a lot of files that should have matched are missing from the Google search results.)
  • only rarely caches the XML (see example) with most caches being blank (like this)
  • associates words in links pointing to the page (as demonstrated with this search)
  • doesn’t allow refining of your query with the operators filetype:rss, filetype:xml, or filetype:rdf

Yahoo:
screenshot of search listing of an RSS feed in Yahoo!

  • has URLs of valid RSS feeds in its index
  • indexes the feed (Evidenced by above screenshot, which was a match for a search on text contained within the feed. Also, ResearchBuzz found this to be the case too.)
  • caches the XML (see example)
  • doesn’t display the “Add to My Yahoo!” link for RSS feed listings (this is a disappointing omission, as Yahoo! displays this link on listings for HTML pages that have an associated RSS feed but not for the listing of the RSS feed itself)
  • associates words in links pointing to the page
  • doesn’t allow refining of your query with the operators filetype:rss, filetype:xml, or filetype:rdf

MSN Search:

  • doesn’t have URLs of valid RSS feeds in its index (Evidence of this: not a single feed out of 64,000 feeds at feeds.feedburner.com is displayed, even though there are links that point to those feeds. Note that the couple feeds that are displayed are not valid feeds but error pages outputted in HTML.)
  • doesn’t recognize the XML file format of RSS feeds (file type is displayed in the search listing after Cached link when it’s a recognized non-HTML file type)
  • doesn’t index the feed
  • doesn’t cache the XML
  • doesn’t allow refining of your query with the operators filetype:rss, filetype:xml, or filetype:rdf

Teoma (Ask Jeeves):
screenshot of search listing of an RSS feed in Teoma

  • has URLs of valid RSS feeds in its index
  • indexes the feed
  • (View Cached feature not supported by Teoma)
  • associates words in links pointing to the page
  • (filetype: operator not supported by Teoma)

As you can see from my little comparison, MSN Search is the farthest behind when it comes to RSS feed indexing. Hopefully Scoble will read this and tell the MSN Search team to get on the ball. ;-)

Even though the major engines have been slow to make RSS an integral part of their indices, I predict that the engines will, within the next year or so, wake from their slumber and overtake and even acquire their specialized RSS feed search engine competitors.

What that will mean for web marketers is that search engine optimizing RSS feeds will become a science unto itself (currently it’s limited mainly to optimizing the item titles for purposes of link text on syndicating sites) and that the feeds that are not optimized will get drowned out by those that are.


What’s wrong with Google Sitemaps

June 6th, 2005

by Stephan Spencer

Last Friday it seemed like the whole blogosphere was abuzz with the news that Google had unveiled its new Google Sitemaps service, a free inclusion service where you publish an XML file of your site’s pages to Google so its spider can get a better sense of what to crawl on your site. This is good news, especially for dynamic sites that aren’t getting fully indexed. I appreciate Google once again showing its thought leadership. Not only is Google giving webmasters a new way to relay information about their site structure to its spiders, but it’s sharing this new technology with the other search engines by releasing the protocol and code as open source.
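To make the "publish an XML file of your site pages" idea concrete, here is a minimal Python sketch that builds such a file. The element names lastmod and changefreq are the metadata fields mentioned by the Sitemaps team; the namespace string, the helper name build_sitemap, and the example.com URLs are my assumptions for illustration, so check them against the official protocol documentation before relying on them.

```python
import xml.etree.ElementTree as ET

# Assumption: schema namespace for the protocol version current at the time
# of writing; confirm it against the official protocol documentation.
SITEMAP_NS = "http://www.google.com/schemas/sitemap/0.84"


def build_sitemap(pages, namespace=SITEMAP_NS):
    """Return sitemap XML for an iterable of page dicts.

    Each dict needs a 'loc' key and may carry 'lastmod' and 'changefreq'.
    (Hypothetical helper, not part of any Google tool.)
    """
    urlset = ET.Element("urlset", xmlns=namespace)
    for page in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = page["loc"]
        # Optional crawl hints: only emitted when the webmaster supplies them.
        for field in ("lastmod", "changefreq"):
            if field in page:
                ET.SubElement(url, field).text = page[field]
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical pages on a placeholder domain.
example_pages = [
    {"loc": "http://www.example.com/", "changefreq": "daily"},
    {"loc": "http://www.example.com/product?id=42", "lastmod": "2005-06-01"},
]
sitemap_xml = build_sitemap(example_pages)
print(sitemap_xml)
```

Note that nothing in this file says which URLs are variations of the same page; each entry is just a location plus optional hints.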

This all sounds wonderful, but there are two major problems with Google’s approach.

  • First, it doesn’t solve the duplicate pages problem that a great many dynamic sites have. Even the Google Store suffers from this (which I blogged about previously, but here’s a more recent example of a Google Store product page being duplicated multiple times in Google’s index). The Google Sitemaps protocol does not provide a way for webmasters to convey which pages are duplicates of other pages. A site that gets crawled incorrectly by Googlebot, due to superfluous or non-essential parameters/flags being included in the URLs of links on the pages, will continue to get crawled incorrectly. An “Official Google Sitemaps Team Member” states that the sitemap XML file will merely augment their crawl; it won’t replace existing pages in the index:

    This program is a complement to, not a replacement of, the regular crawl. The benefit of Sitemaps is two fold:
    – For links we already know about thro our regular spidering, we plan to use the metadata you supply (e.g., lastmod date, changefreq, etc.) to improve how we crawl your site.
    – For the links we dont know about, we plan to use the additional links you supply, to increase our crawl coverage.

    The high-level Google engineer who goes by GoogleGuy in the online forums explains Google Sitemaps in this way:

    Imagine if you have pages A, B, and C on your site. We find pages A and B through our normal web crawl of your links. Then you build a sitemap and list the pages B and C. Now there’s a chance (but not a promise) that we’ll crawl page C. We won’t drop page A just because you didn’t list it in your sitemap. And just because you listed a page that we didn’t know about doesn’t guarantee that we’ll crawl it. But if for some reason we didn’t see any links to C, or maybe we knew about page C but the url was rejected for having too many parameters or some other reason, now there’s a chance that we’ll crawl that page C.

    So, the way I read GoogleGuy’s explanation, if pages A and C are essentially duplicates of each other, with A containing an additional superfluous parameter in its URL (like sortby=default or lang=english), then BOTH could end up in Google’s index. Thus, Google Sitemaps won’t reduce the amount of duplication in Google’s index; in fact, I believe it will increase it.

    Duplicate content, on its own, may sound like more of a problem for Google itself, which has to dedicate additional resources to maintaining all this redundant content in its index, than for webmasters. However, it does have serious implications for webmasters, because it results in PageRank dilution, where multiple versions of a page split up the “votes” (links) and PageRank score that a single version of the page would aggregate.

  • This brings me to the second, related problem with Google Sitemaps: it doesn’t do anything to alleviate the phenomenon of PageRank dilution. PageRank dilution results in lower PageRank, which in turn results in lower rankings. For example, consider that the above-mentioned Google Store’s product page (the “Black is Back T-Shirt”) is in Google’s index 5 times instead of just once. So each of those 5 variations earns only a fraction of the total potential PageRank score that it could have earned if all the links pointed to a single “Black is Back T-Shirt” page.

    Google Sitemaps needs to provide a way to convey, or to sync up with, the site’s hierarchical internal linking structure, so that it’s clear which pages should get how much of a share of the PageRank flowing into the site’s home page. Since the primary holder of PageRank score is the home page (that is, after all, the page that most everyone links to), it’s up to the site’s internal hierarchical linking structure to pass the PageRank of the home page to the rest of the site. As such, a page that is 2 clicks away from the home page will get a much larger share of PageRank score passed on to it from the home page, versus a page that is 5 clicks away from the home page.
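The arithmetic behind both points can be roughed out in a few lines of Python. This is a toy model, not Google's actual algorithm: it assumes the score is split evenly across duplicate variations, that every page divides what it passes on evenly among a fixed number of links (10 here, an arbitrary figure), and that a damping factor of 0.85 (the value commonly cited for PageRank) applies at each click. The function names are mine.

```python
def diluted_share(total_score, variants):
    """Score each URL variation earns if the total is split evenly among them."""
    return total_score / variants


def share_at_depth(clicks_from_home, links_per_page=10, damping=0.85):
    """Fraction of the home page's score reaching one page at a given depth.

    Toy assumption: each page passes on `damping` of its score, divided
    evenly among `links_per_page` outgoing links.
    """
    return (damping / links_per_page) ** clicks_from_home


# A product page indexed 5 times instead of once.
per_variation = diluted_share(1.0, 5)

# A page 2 clicks from the home page versus one 5 clicks away.
shallow = share_at_depth(2)
deep = share_at_depth(5)

print(f"each of 5 variations gets {per_variation:.0%} of the potential score")
print(f"2 clicks deep: {shallow:.6f} of the home page's score")
print(f"5 clicks deep: {deep:.9f} of the home page's score")
```

Under these assumptions the share shrinks geometrically with every extra click, which is why depth in the linking hierarchy matters so much.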

Here’s how I suggest both of the above issues be rectified: by extending robots.txt with some additional directives that specify:

  • which parameter in a dynamic URL is the “key field”
  • which parameter is the product ID and which is the category ID (specifically for online catalogs)
  • which parameters are superfluous or don’t significantly vary the content displayed

Armed with this information, Googlebot would be able to not only eliminate duplicate pages but also intelligently choose the most appropriate version to save in its index and then associate with that page the PageRank of ALL versions of the page. The days of session IDs killing a site’s Google visibility would be over! Google admits in its Sitemaps FAQ that session IDs are still a problem even with the advent of Google Sitemaps:

Q: URLs on my site have session IDs in them. Do I need to remove them?

Yes. Including session IDs in URLs may result in incomplete and redundant crawling of your site.
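Here is a rough Python sketch of what a crawler could do if webmasters were able to declare superfluous parameters as suggested above. No such robots.txt directive exists; the parameter names, the example.com URLs, and both function names are hypothetical, and counting inbound links as "votes" is a simplification of how PageRank actually works.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical: parameters a webmaster might declare as superfluous.
SUPERFLUOUS_PARAMS = {"sessionid", "sortby", "lang"}


def canonical_url(url, superfluous=SUPERFLUOUS_PARAMS):
    """Strip superfluous parameters and sort the rest into a stable order."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in superfluous
    ]
    kept.sort()
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), "")
    )


def consolidate_votes(inbound_link_targets, superfluous=SUPERFLUOUS_PARAMS):
    """Count inbound links per canonical URL instead of per URL variation."""
    votes = defaultdict(int)
    for target in inbound_link_targets:
        votes[canonical_url(target, superfluous)] += 1
    return dict(votes)


# Hypothetical inbound links pointing at variations of the same product page.
links = [
    "http://www.example.com/product?id=42",
    "http://www.example.com/product?id=42&sortby=default",
    "http://www.example.com/product?lang=english&id=42",
    "http://www.example.com/product?id=42&sessionid=abc123",
    "http://www.example.com/product?id=7",
]
votes = consolidate_votes(links)
print(votes)
```

The four variations of product 42 collapse into a single URL that collects all four links, rather than four URLs with one link apiece.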

Remember, getting indexed only gets you to the party, it doesn’t mean you’re going to be popular at the party. Google Sitemaps may help you get more pages indexed, but if those pages all have a PageRank score of 0, then what was the point? It’ll be like sitting along the wall the whole time with no one asking you to dance!

GravityStream, our SEO proxy technology (the concept of SEO proxies is explained in my article in Catalog Age last October) deals with PageRank dilution by distilling URLs in links into their lowest common denominator and replacing them on the proxy. We’ve found that, even as Googlebot gets more aggressive at spidering dynamic sites with complex URLs and starts indexing one of our clients’ sites more fully, our proxy still has a major leg-up on the native site that it’s proxying. For example, our GravityStream proxy of PETsMART.com is #1 in Google for “best pet toys”, and yet the corresponding page on the PETsMART.com native site is nowhere in the first 10 pages of results even though it is indexed. Until Google extends Google Sitemaps to deal with PageRank dilution, I’d expect that a GravityStream proxy will still trump a native site, even if it’s using Google Sitemaps. That means that currently, despite Google Sitemaps, GravityStream still plays an important role for online retailers. Nonetheless, it’s my sincere hope that Google takes my feedback on board and reworks their protocol!

